Fast food consumption is widely regarded as a key factor in dietary health and has long been associated with poor eating habits and poor physical health. Using BMI as the outcome, we measure the association between fast food consumption and dietary health. Frequency of exercise is also included in the regression model to explore how it relates to BMI alongside fast food consumption.
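library(tidyverse)
library(plotly)

# Read the respondent file; keep the case ID plus all er*/eu* variables except erhhch
raw_eathealth = read_csv("ehresp_2014.csv") %>%
  mutate(
    tucaseid = as.factor(tucaseid)
  ) %>%
  select(tucaseid, starts_with(c("er", "eu")), -erhhch)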
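# Keep BMI, fast food frequency, and exercise frequency;
# drop rows where any of the three is not strictly positive
internal_indicators = raw_eathealth %>%
  select(erbmi, eufastfdfrq, euexfreq) %>%
  filter(if_all(everything(), ~ .x > 0))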
fig_data = internal_indicators %>%
  pivot_longer(
    eufastfdfrq:euexfreq,
    names_to = "indicator",
    values_to = "value"
  ) %>%
  mutate(
    indicator = recode(indicator,
                       "eufastfdfrq" = "Fast Food Consumption",
                       "euexfreq" = "Exercise"),
    indicator = factor(indicator, levels = c("Fast Food Consumption", "Exercise"))
  )
# Fit a separate BMI-vs-frequency line for each indicator (used only for plotting)
mlr_plot = fig_data %>%
  lm(erbmi ~ value * indicator, data = .)
fig_data %>%
  mutate(
    fit_value = fitted(mlr_plot)
  ) %>%
  plot_ly(x = ~value, y = ~erbmi, frame = ~indicator, color = ~indicator,
          type = "scatter", mode = "markers") %>%
  # add_lines() already draws in line mode, so no mode argument is needed
  add_lines(x = ~value, y = ~fit_value, frame = ~indicator) %>%
  layout(title = "Scatter plots of BMI vs. Frequency of Fast Food Consumption and Exercise",
         xaxis = list(title = "Frequency"), yaxis = list(title = "BMI")) %>%
  animation_opts(2500, easing = "linear", redraw = TRUE) %>%
  animation_slider(hide = FALSE)
corr_eff = as.data.frame(round(cor(internal_indicators), 4))
In the scatter plot of BMI vs. frequency of fast food consumption, the frequency variable clusters at discrete values and looks almost categorical, which is probably a consequence of collecting the data by questionnaire. The regression line of BMI on frequency of fast food consumption has a positive slope: BMI tends to increase as fast food consumption increases, suggesting that more frequent fast food consumption is associated with poorer dietary health.
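The same discreteness appears in the exercise data. The regression line of BMI on frequency of exercise has a negative slope, meaning that more frequent exercise is associated with lower BMI. The two predictors therefore relate to BMI in opposite directions. The correlation coefficient between fast food consumption and exercise is -0.0097, which indicates that the two predictors are almost linearly unrelated to each other; on its own it does not establish an interaction between them.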
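The correlation between two predictors and an interaction between their effects are separate questions: the first is a check on how the predictors move together, while the second is usually examined with a product term in the regression. The following is a minimal sketch on simulated data, not the survey data; the variable names `fastfood`, `exercise`, and `bmi` and all simulated values are hypothetical.

```r
# Simulated stand-in data (hypothetical values, not the survey responses)
set.seed(1)
n <- 500
sim <- data.frame(
  fastfood = rpois(n, 2) + 1,  # hypothetical fast food frequency
  exercise = rpois(n, 3) + 1   # hypothetical exercise frequency
)
sim$bmi <- 27 + 0.1 * sim$fastfood - 0.2 * sim$exercise + rnorm(n, sd = 5)

# Correlation between the two predictors: a check on how they move together
pred_cor <- cor(sim$fastfood, sim$exercise)

# Interaction: does the fast food slope change with exercise frequency?
int_model <- lm(bmi ~ fastfood * exercise, data = sim)
int_coef <- summary(int_model)$coefficients["fastfood:exercise", ]

print(round(pred_cor, 4))
print(round(int_coef, 4))
```

The `fastfood:exercise` row holds the estimate, standard error, t value, and p-value of the product term, which is the quantity that speaks to an interaction.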
Next, we fit a multiple regression model of BMI on the two predictors.
mlr_model = lm(erbmi ~ eufastfdfrq + euexfreq, data = internal_indicators)
summ_mlr = summary(mlr_model)
par(mfrow = c(2, 2))
plot(mlr_model)
The multiple regression model of BMI on frequency of fast food consumption and exercise gives an estimated coefficient of 0.1025 for fast food consumption (p-value 0.0125) and -0.1885 for exercise (p-value 1.3260743 × 10^{-7}). R-squared: 0.0085.
Both p-values are below 0.05, so both coefficients are statistically significant in the multiple regression model. However, the low R-squared value means the model explains very little of the variation in BMI, which motivates further residual and outlier checks. The residuals clearly deviate from a normal distribution. Both the residual scatter plots and the leverage plot show multiple outliers, which affect the fitted regression of BMI on frequency of fast food consumption and exercise.
The model is adjusted by excluding the existing outliers.
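# Attach internally studentized residuals from the unadjusted model
adjust_model = internal_indicators %>%
  mutate(
    rstand = rstandard(mlr_model)
  )
# Observations flagged as outliers (|residual| > 2.5)
outliers = adjust_model %>%
  filter(abs(rstand) > 2.5)
# Refit the same model without the flagged observations
mlr_adjust = adjust_model %>%
  filter(abs(rstand) < 2.5) %>%
  lm(erbmi ~ eufastfdfrq + euexfreq, data = .)
mlr_adjust_summ = summary(mlr_adjust)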
89 outliers are excluded on the basis of internally studentized residuals (absolute value above 2.5). Compared with the unadjusted model, the adjusted multiple regression model shows a lower estimated slope of 0.0922 for fast food consumption, with a lower p-value of 0.0095. The slope for exercise is larger in magnitude, at -0.232211. R-squared: 0.0153, an increase of 0.0068; adjusted R-squared: 0.0148, an increase of 0.0068.
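A before-and-after comparison like the one above can be collected into a single table so the slopes and fit statistics sit side by side. The following is a minimal sketch on simulated data, not the survey data; the helper `fit_row()`, the variable names, and the simulated values are all hypothetical, and only the 2.5 cutoff is taken from the analysis.

```r
# Simulated stand-in data with heavy-tailed noise (hypothetical values)
set.seed(42)
n <- 1000
sim <- data.frame(
  fastfood = sample(1:7, n, replace = TRUE),
  exercise = sample(1:10, n, replace = TRUE)
)
sim$bmi <- 27 + 0.1 * sim$fastfood - 0.2 * sim$exercise + 4 * rt(n, df = 3)

full_fit <- lm(bmi ~ fastfood + exercise, data = sim)

# Keep observations whose internally studentized residual is within the cutoff
keep <- abs(rstandard(full_fit)) <= 2.5
trim_fit <- lm(bmi ~ fastfood + exercise, data = sim[keep, ])

# Hypothetical helper: one row of slopes and fit statistics per model
fit_row <- function(fit) {
  s <- summary(fit)
  data.frame(
    n = nobs(fit),
    fastfood = unname(coef(fit)["fastfood"]),
    exercise = unname(coef(fit)["exercise"]),
    r_squared = s$r.squared,
    adj_r_squared = s$adj.r.squared
  )
}

comparison <- rbind(full = fit_row(full_fit), trimmed = fit_row(trim_fit))
print(round(comparison, 4))
```

Because the trimmed model is fitted to a subset chosen using the first model's residuals, an increase in R-squared after trimming is expected and should be read with that in mind.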
par(mfrow = c(2,2))
plot(mlr_adjust)